Abstract

A contemporary and fundamental problem faced by many evolutionary
biologists is how to puzzle together a collection V of partial trees
(leaf-labelled trees whose leaves are bijectively labelled by species or,
more generally, taxa, each supported by e.g. a gene) into an overall
parental structure that displays all trees in V. This already difficult
problem is complicated by the fact that the trees in V regularly support
conflicting phylogenetic relationships and are not on the same but only on
overlapping taxa sets. A desirable requirement on the sought-after parental
structure therefore is that it can accommodate the observed conflicts.
Phylogenetic networks are a popular tool capable of doing precisely this.
However, not much is known about how to construct such networks from
partial trees, a notable exception being the Z-closure super-network
approach and the recently introduced Q-imputation approach. Here, we
propose the usage of closure rules to obtain such a network. In particular,
we introduce the novel Y-closure rule and show that this rule on its own or
in combination with one of Meacham's closure rules (which we call the
M-rule) has some very desirable theoretical properties. In addition, we use
the M- and Y-rule to explore the dependency of Rivera et al.'s "ring of
life" on the fact that the underpinning phylogenetic trees are all on the
same data set. Our analysis culminates in the presentation of a collection
of induced subtrees from which this ring can be reconstructed.

1 Introduction 

Phylogenetic trees have proved an important tool for representing
evolutionary relationships. For a set X of species (or, more generally,
taxa) these are formally defined as leaf-labelled trees whose leaves are
bijectively labelled by the elements of X. Advances in DNA sequencing have
resulted in ever more data on which such trees may be based. Computational
limitations, however, combined with the need to understand species
evolution have left biologists with the following fundamental problem which
we will refer to as the amalgamation problem: given a collection V of
phylogenetic trees, how can these trees be amalgamated into an overall
parental structure that preserves the phylogenetic relationships supported
by the trees in V? The hope is that such a structure might help shed light
on the evolution of the underlying genomes (and thus the species).

In the ideal case that all trees in V support the same phylogenetic
relationships (as is the case for trees $T_1$ and $T_2$ depicted in Fig. 1)
this structure is known to be a phylogenetic tree and a supertree method [2]
may be used to reconstruct it. For the above example the outcome $T^*$ of
such a method is

Figure 1: Three phylogenetic trees, which appeared in weighted form in [17],
on subsets of the 7 plant species: A.thaliana (A.th), A.suecica (A.su),
Turritis (Tu), A.arenosa (A.ar), A.cebennensis (A.ce), Crucihimalaya (Cru)
and A.halleri (A.ha).



$T_1$ with species Cru (see Fig. 1 for full species names) attached via a
pendant edge to the vertex labelled v. It should be noted that $T^*$
supports the same phylogenetic relationships as $T_1$ and $T_2$ in the
following sense: For a finite set $X$, call a bipartition
$S = \{A, \overline{A}\}$ of some subset $X' \subseteq X$ a partial split on
$X$, or a partial ($X$)-split for short, and denote it by $A|\overline{A}$
or, equivalently, by $\overline{A}|A$ where $\overline{A} := X' - A$. In
particular, call $S$ a (full) split of $X$ if $X' = X$. Furthermore, say
that a partial $X$-split $S = A|\overline{A}$ extends a partial $X$-split
$S' = B|\overline{B}$ if either $B \subseteq A$ and
$\overline{B} \subseteq \overline{A}$ or $B \subseteq \overline{A}$ and
$\overline{B} \subseteq A$. Finally, say that a phylogenetic tree $T$
displays a split $S = A|\overline{A}$ if $S$ is a partial split on the leaf
set $\mathcal{L}(T)$ of $T$ induced by deleting an edge of $T$. Then
"supports the same phylogenetic relationships" means that for every split
$S$ displayed by $T_1$ or $T_2$ there exists a split on $\mathcal{L}(T^*)$
that extends $S$ and is displayed by $T^*$.
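
The definitions of a partial split, a full split and the "extends" relation can be made concrete with a small sketch. The representation below (a split as an unordered pair of frozensets, taxa as single characters) and the names `make_split`, `is_full` and `extends` are illustrative assumptions, not notation from the paper.

```python
from typing import FrozenSet, Iterable

Part = FrozenSet[str]
Split = FrozenSet[Part]  # unordered pair {A, A-bar}


def make_split(a: Iterable[str], b: Iterable[str]) -> Split:
    """Build the partial split A|A-bar from two non-empty, disjoint sets of taxa."""
    part_a, part_b = frozenset(a), frozenset(b)
    if not part_a or not part_b or part_a & part_b:
        raise ValueError("parts must be non-empty and disjoint")
    return frozenset({part_a, part_b})


def is_full(s: Split, x: Iterable[str]) -> bool:
    """A partial split on X is a full split if its two parts cover X."""
    return frozenset().union(*s) == frozenset(x)


def extends(s: Split, t: Split) -> bool:
    """True if s = A|A-bar extends t = B|B-bar (in either orientation)."""
    a, a_bar = tuple(s)
    b, b_bar = tuple(t)
    return (b.issubset(a) and b_bar.issubset(a_bar)) or (
        b.issubset(a_bar) and b_bar.issubset(a)
    )
```

For instance, with X = {1, ..., 5}, the partial split 125|34 extends 12|34 but not vice versa, and 125|34 is a full split of X whereas 12|34 is not.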

Due to complex evolutionary mechanisms such as incomplete lineage sorting,
recombination (in the case of viruses), or lateral gene transfer (in the
case of bacteria) the trees in V may however support not the same but
conflicting phylogenetic relationships. A phylogenetic network in the form
of a split network (see [10, 19] for overviews) rather than a phylogenetic
tree is therefore the structure of choice if one wishes to simultaneously
represent all phylogenetic relationships supported by the trees in V. An
example in point is the split network pictured in Fig. 2 which appeared as a
weighted network in [17]. Replacing "edge" in the definition of displaying
by "band of parallel edges" and "$\mathcal{L}(T)$" by "set of network
vertices of degree 1" yields a definition for when a split network displays
a split. With this definition, it is straightforward to check that the
network in Fig. 2 displays all splits displayed by the 3 trees pictured in
Fig. 1.

It should be noted that phylogenetic networks such as the one depicted in
Fig. 2 (see e.g. [TJ HH Q3] for recently introduced other types of
phylogenetic networks) provide a means to visualize the complexity of a data
set and should not be thought of as an explicit model of evolution.
Awareness of this complexity not only allows the exploration of a data set
but, as is the case for e.g. hybridization networks [2], can also serve as
a starting point for obtaining an explicit model of evolution (see [12] for
more on this).

Figure 2: A circular phylogenetic network that represents all phylogenetic
relationships supported by the trees depicted in Fig. 1 (see that figure for
full species names).

Apart from displaying all splits induced by the 3 trees depicted in Fig. 1,
the network depicted in Fig. 2 has a further interesting feature: it is
circular. In other words, if $X$ denotes the set of the 7 plant species
under consideration, then the elements of $X$ can be arranged around a
circle $C$ so that every split $S = A|\overline{A}$ of $X$ displayed by the
network can be obtained by intersecting $C$ with a straight line so that the
label set of one of the resulting 2 connected components is $A$ and the
label set of the other is $\overline{A}$.

Although seemingly a very special type of phylogenetic network, circular 
phylogenetic networks are a frequently used structure in phylogenetics (see 
e. g. [3 [U El [8]) as they do not only naturally generalize the concept of a 
phylogenetic tree but are also guaranteed to be representable in the plane; a 
fact that greatly facilitates drawing and thus analyzing them. However, al- 
though recently first steps have been made with regards to finding a solution 
to the amalgamation problem in terms of a phylogenetic network leading to 
the attractive Z-closure [13] and Q-imputation [9] approaches, very little is 
known about a solution of this problem in terms of a circular phylogenetic 
network. 

Intrigued by this and motivated by the fact that, from a combinatorial point
of view, phylogenetic trees and networks are split systems (i.e. collections
of full splits) and that therefore the amalgamation problem boils down to
the problem of how to extend partial splits on some set $X$ to splits on
$X$, we wondered whether closure rules for partial splits could be of help.
Essentially mechanisms for enlarging splits, such rules have proved useful
for supertree construction and also underpin the above mentioned Z-closure
super-network approach. As it turns out, this is indeed the case. As an
immediate consequence of our main result (Corollary 5.5), we obtain that
for a collection of partial splits that can be "displayed" by a circular
phylogenetic network $\mathcal{N}$, the collection of (full) splits
generated by the closure rules at the centre of this paper is guaranteed to
be displayable by $\mathcal{N}$ and also independent of the order in which
the rules are applied.

In a study aimed at shedding light on the origin of eukaryotes, Rivera et
al. [20] put forward the idea of a "ring of life" with the eukaryotic genome
being the result of a fusion of two diverse prokaryotic genomes (see also
[16, 20, 23]). A natural and interesting question in this context is how
dependent Rivera et al.'s ring of life is on the fact that all underpinning
trees are on the same taxa set. In the last section of this paper, we
provide a partial answer by presenting an example of a collection of induced
partial trees from which the ring of life can be reconstructed using the M-
and Y-rule.

The paper is organized as follows. In Section 2, we first introduce some
more terminology and then restate one of Meacham's closure rules (our
M-rule) and introduce the novel Y-rule. In Section 3, we study the
relationship between the M- and Y-rule and the closure rule that underpins
the aforementioned Z-closure super-network approach. In Section 4, we
introduce the concept of a circular collection of partial splits and show
that both the Y- and M-rule preserve circularity (Proposition 4.4). In
Section 5, we introduce the concept of a split closure and show that for
certain collections of partial splits this closure is independent of the
order in which the Y-rule and/or M-rule are/is applied (Theorem 5.3). This
result lies at the heart of Corollary 5.5. In Section 6, we explore the
dependency of Rivera et al.'s ring of life on the fact that the underpinning
trees are all on the same data set.

Throughout the paper, X denotes a finite set and the terminology and 
notation largely follows [21]. 
2 Closure rules



We start this section by introducing some additional terminology and nota- 
tion. Subsequent to this, we first restate Meacham's rule (which we call the 
M-rule) and then introduce a novel closure rule which we call the Y-rule. 

Let $\Sigma(X)$ denote the collection of all partial splits of $X$ and
suppose $\Sigma \subseteq \Sigma(X)$. Then a partial split $S \in \Sigma$
that can be extended by a partial split $S' \in \Sigma - \{S\}$ is called
redundant. The set obtained by removing redundant elements from $\Sigma$ is
denoted by $\Sigma^-$. If $\Sigma = \Sigma^-$ then $\Sigma$ is called
irreducible and the set of all irreducible subsets in $\Sigma(X)$ is denoted
by $V(X)$. Note that the relation $\preceq$ defined for any two (partial)
split collections $\Sigma, \Sigma' \in V(X)$ by putting
$\Sigma \preceq \Sigma'$ if every partial split in $\Sigma$ is extended by a
partial split in $\Sigma'$ is a partial order on $V(X)$ [21].
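
As a minimal sketch of these notions, the reduction of a collection to its irreducible part and the order relation on collections can be written as follows. The function names `sp`, `reduce_splits` and `precedes` and the set-based representation are assumptions made for illustration only.

```python
def sp(a, b):
    """Partial split A|A-bar as an unordered pair of frozensets (taxa are characters)."""
    return frozenset({frozenset(a), frozenset(b)})


def extends(s, t):
    """True if split s extends split t."""
    a, a_bar = tuple(s)
    b, b_bar = tuple(t)
    return (b.issubset(a) and b_bar.issubset(a_bar)) or (
        b.issubset(a_bar) and b_bar.issubset(a)
    )


def reduce_splits(sigma):
    """Remove every redundant split, i.e. one extended by a different split of sigma."""
    return frozenset(
        s for s in sigma if not any(t != s and extends(t, s) for t in sigma)
    )


def precedes(sigma, sigma_prime):
    """True if every split in sigma is extended by some split in sigma_prime."""
    return all(any(extends(t, s) for t in sigma_prime) for s in sigma)
```

For example, in the collection {12|34, 125|34, 13|24} the split 12|34 is redundant because 125|34 extends it, so the reduced collection is {125|34, 13|24}, and the original collection precedes its reduction.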

Suppose for the following that $\theta$ is a closure rule, that is, a
replacement rule that replaces a collection
$\mathcal{A} \subseteq \Sigma(X)$ of partial splits that satisfy some
condition $C_\theta$ by a collection
$\theta(\mathcal{A}) \subseteq \Sigma(X)$ whose elements are generated in
some systematic way from the partial splits in $\mathcal{A}$ (see e.g. the
M- and the Y-closure rules presented below for two such systematic ways).
Suppose $\Sigma, \Sigma' \in V(X)$ are two irreducible collections of
partial splits and $C_\theta(\Sigma)$ is the set of all subsets of $\Sigma$
that satisfy $C_\theta$. If there exists some subset
$\mathcal{A} \in C_\theta(\Sigma)$ such that
$\Sigma' = (\Sigma \cup \theta(\mathcal{A}))^-$ then we say that $\Sigma'$
is obtained from $\Sigma$ via a single application of $\theta$. Finally, if
for every subset $\mathcal{A} \in C_\theta(\Sigma)$ we have
$\theta(\mathcal{A})^- \preceq \Sigma$ then we call an application of
$\theta$ to $\Sigma$ trivial and say that $\Sigma$ is closed with respect to
$\theta$.

We are now in a position to present the two closure rules we are mostly
concerned with in this paper: the M-rule, which is originally due to Meacham
[18], and the novel Y-rule. We start with Meacham's rule.

2.1 The M-rule 

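Suppose $S_1, S_2 \in \Sigma(X)$ are two distinct partial splits of $X$.
Then the M-rule $\theta_M$ is as follows:

($\theta_M$) If there exists some $A_i \in S_i$, $i = 1, 2$, such that

$$A_1 \cap A_2 \neq \emptyset \quad\text{and}\quad \overline{A_1} \cap \overline{A_2} \neq \emptyset \qquad (1)$$

then replace $\mathcal{A} = \{S_1, S_2\}$ by the set
$\theta_M^{\{A_1,A_2\}}(\mathcal{A})$ which comprises $\mathcal{A}$ and, in
addition, also the partial splits

$$S'_1 = (A_1 \cap A_2)|(\overline{A_1} \cup \overline{A_2}) \quad\text{and}\quad S'_2 = (\overline{A_1} \cap \overline{A_2})|(A_1 \cup A_2).$$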
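
The following sketch illustrates one way the M-rule could be evaluated for a pair of partial splits. It assumes the reading of Condition (1) in which both $A_1 \cap A_2$ and the intersection of the two complementary parts are non-empty, and it returns only the two generated splits (the rule itself also retains $S_1$ and $S_2$). The names `sp` and `m_rule` are hypothetical.

```python
from itertools import product


def sp(a, b):
    """Partial split A|A-bar as an unordered pair of frozensets (taxa are characters)."""
    return frozenset({frozenset(a), frozenset(b)})


def m_rule(s1, s2):
    """Splits generated by the M-rule, over every admissible choice of A1 in s1, A2 in s2."""
    generated = set()
    for a1, a2 in product(s1, s2):
        (b1,) = s1 - {a1}  # complementary part of a1 within s1
        (b2,) = s2 - {a2}  # complementary part of a2 within s2
        if a1 & a2 and b1 & b2:  # Condition (1)
            generated.add(frozenset({a1 & a2, b1 | b2}))
            generated.add(frozenset({b1 & b2, a1 | a2}))
    return generated
```

For $S_1 = 123|45$ and $S_2 = 126|47$ the rule generates 12|457 and 4|1236, whichever of the two admissible choices of parts is used.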
In case the partial splits $S_1$ and $S_2$ are such that there is no
ambiguity with regards to the identity of the sets $A_1$ and $A_2$ in the
statement of the M-rule or they are irrelevant to the discussion, we will
simplify $\theta_M^{\{A_1,A_2\}}(\mathcal{A})$ to $\theta_M(\mathcal{A})$.
Clearly, such ambiguity cannot arise if $S_1$ and $S_2$ are compatible, that
is, there exist subsets $D_i \in S_i$, $i = 1, 2$, such that
$D_1 \cap D_2 = \emptyset$. However, if $S_1$ and $S_2$ are incompatible,
that is, not compatible, then caution is required.

Note that if $A_1$ and $A_2$ as in the statement of the M-rule are such that
$\overline{A_2} \subseteq \overline{A_1}$ and $A_1 \subseteq A_2$, then it
is easy to verify that $\theta_M$ applies trivially to $\mathcal{A}$. Also
note that for any $\Sigma \in V(X)$ and any two distinct partial splits
$S_1, S_2 \in \Sigma$, we have

$$\Sigma \preceq (\Sigma \cup \theta_M(\{S_1, S_2\}))^-.$$

Finally, note that any phylogenetic tree on $X$ that displays the partial
splits in some set $\Sigma \in V(X)$ also displays the partial splits in
$(\Sigma \cup \theta_M(\{S_1, S_2\}))^-$, $S_1, S_2 \in \Sigma$.

2.2 The Y-rule

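Suppose $S_i \in \Sigma(X)$, $i = 1, 2, 3$, are three distinct partial
splits of $X$. Then the Y-rule $\theta_Y$ is as follows:

($\theta_Y$) If there exists some $A_i \in S_i$, $i = 1, 2, 3$, such that

$$\emptyset \notin \{A_1 \cap \overline{A_2} \cap \overline{A_3},\ \overline{A_1} \cap A_2 \cap \overline{A_3},\ \overline{A_1} \cap \overline{A_2} \cap A_3\} \quad\text{and}\quad A_1 \cap A_2 \cap A_3 = \emptyset \qquad (2)$$

(see Fig. 3(a) for a graphical interpretation), then replace
$\mathcal{A} = \{S_1, S_2, S_3\}$ by the set
$\theta_Y^{\{A_1,A_2,A_3\}}(\mathcal{A})$ which comprises the partial splits

$$S'_1 = \overline{A_1} \cup (A_2 \cap A_3)|A_1, \quad S'_2 = \overline{A_2} \cup (A_1 \cap A_3)|A_2, \quad\text{and}\quad S'_3 = \overline{A_3} \cup (A_1 \cap A_2)|A_3.$$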

Although the condition in (2) might look quite strange at first sight, the
class of triplets of partial splits that satisfy it is very rich. For
example, suppose that $S_i = A_i|\overline{A_i}$, $i = 1, 2, 3$, are splits
of $X$ that can be arranged in the plane as indicated in Fig. 3(b) where
each bold, straight line represents one of $S_i$, $i = 1, 2, 3$, and the
dots represent non-empty triplewise intersections
of the parts of Si, i = 1,2, 3, in which they lie. For example, the dot in the 
bottom wedge represents the intersection Ai n A2 H A 3 . The shaded regions 
correspond to the 3 non-empty intersections mentioned in the statement of 
the Y-rule. The partial splits $S'_i = A'_i|\overline{A'_i}$, $i = 1, 2, 3$,
obtained by restricting $S_1$, $S_2$, and $S_3$ to different subsets of $X$
so that the shaded regions remain non-empty form a triplet of partial splits
that satisfy (2).

Figure 3: (a) A graphical representation of Condition (2) in the form of a Y.
(b) An example of three splits, depicted in bold lines, that satisfy
Condition (2) - see text for details.

As the example of the set $\mathcal{A}$ comprising the three partial splits
$S_1 = 145|2367$, $S_2 = 1357|246$, and $S_3 = 127|356$ shows, different
choices of the sets $A_i$, $i = 1, 2, 3$, lead to different sets
$\theta_Y^{\{A_1,A_2,A_3\}}(\mathcal{A})$. For example, if
$A_1 := \{1,4,5\}$, $\overline{A_2} := \{1,3,5,7\}$ (so that
$A_2 = \{2,4,6\}$), and $\overline{A_3} := \{1,2,7\}$ (so that
$A_3 = \{3,5,6\}$) then (2) is satisfied and
$\theta_Y^{\{A_1,A_2,A_3\}}(\mathcal{A}) = \{S_1, S_2, 1247|356\}$. If
however $A_1$ and $A_2$ are as before and $\overline{A_3} := \{3,5,6\}$,
then (2) is also satisfied and $\theta_Y^{\{A_1,A_2,A_3\}}(\mathcal{A})$ is
the set $\{S_1, S_2, 127|3456\}$. Following our practice for the M-rule, for
$\mathcal{A} = \{S_1, S_2, S_3\}$ we simplify
$\theta_Y^{\{A_1,A_2,A_3\}}(\mathcal{A})$ to $\theta_Y(\mathcal{A})$ if the
partial splits $S_i$, $i = 1, 2, 3$, are such that there is no ambiguity
with regards to the identity of the sets $A_i$, $i = 1, 2, 3$, in the
statement of the Y-rule or they are irrelevant to the discussion.
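
The dependence of the Y-rule on the choice of parts can be checked mechanically. The sketch below is an illustration under a stated assumption: the overlines in Condition (2) were lost in this copy of the text, so the code uses the reading in which $A_1 \cap A_2 \cap A_3$ is empty, each of the three intersections containing exactly one uncomplemented part is non-empty, and $A_j \cap A_k$ is added to the part complementary to $A_i$. This reading reproduces both outcomes quoted in the example above. The names `sp`, `other` and `y_rule` are hypothetical.

```python
def sp(a, b):
    """Partial split A|A-bar as an unordered pair of frozensets (taxa are characters)."""
    return frozenset({frozenset(a), frozenset(b)})


def other(s, a):
    """The part of split s that is complementary to its part a."""
    (b,) = s - {a}
    return b


def y_rule(s1, s2, s3, a1, a2, a3):
    """Apply the Y-rule for one choice of parts; return None if Condition (2) fails."""
    b1, b2, b3 = other(s1, a1), other(s2, a2), other(s3, a3)
    must_be_nonempty = [a1 & b2 & b3, b1 & a2 & b3, b1 & b2 & a3]
    if not all(must_be_nonempty) or (a1 & a2 & a3):
        return None
    return {
        sp(b1 | (a2 & a3), a1),
        sp(b2 | (a1 & a3), a2),
        sp(b3 | (a1 & a2), a3),
    }
```

With $S_1 = 145|2367$, $S_2 = 1357|246$ and $S_3 = 127|356$, choosing the parts 145, 246, 356 yields $\{S_1, S_2, 1247|356\}$, whereas choosing 145, 246, 127 yields $\{S_1, S_2, 127|3456\}$.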

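Note that if $A_i$, $i = 1, 2, 3$, as in the statement of the Y-rule are
such that, in addition,
$\emptyset \neq A_1 \cap A_2 \subseteq \overline{A_3}$,
$\emptyset \neq A_1 \cap A_3 \subseteq \overline{A_2}$,
$\emptyset \neq A_3 \cap A_2 \subseteq \overline{A_1}$, and
$\overline{A_1} \cap \overline{A_2} \cap \overline{A_3} \neq \emptyset$ it
is easy to see that $\theta_Y$ applies trivially to $\mathcal{A}$. Also note
that for any $\Sigma \in V(X)$ and any 3 partial splits
$S_1, S_2, S_3 \in \Sigma$ of $X$, we have

$$\Sigma \preceq (\Sigma \cup \theta_Y(\{S_1, S_2, S_3\}))^-.$$

3 First closure rule relationships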

In this section we first restate the Z- (closure) rule which was used in
[13] in the context of a supernetwork construction approach and then
investigate the relationship between the Y-, M-, and Z-rule.

Also originally due to Meacham [18], the Z-rule $\theta_Z$ can be restated
as follows: Suppose $S_1, S_2 \in \Sigma(X)$ are two distinct partial splits
of $X$.

($\theta_Z$) If there exists some $A_i \in S_i$, $i = 1, 2$, such that

$$\emptyset \notin \{A_1 \cap A_2,\ A_2 \cap \overline{A_1},\ \overline{A_1} \cap \overline{A_2}\} \quad\text{and}\quad A_1 \cap \overline{A_2} = \emptyset \qquad (3)$$

then replace $\mathcal{A} = \{S_1, S_2\}$ by the set $\theta_Z(\mathcal{A})$
which comprises the partial splits
$(\overline{A_1} \cup \overline{A_2})|A_1$ and
$\overline{A_2}|(A_1 \cup A_2)$.

Note that any two compatible partial splits of $X$ satisfy the condition in (3).

With this third closure rule at hand we are now in a position to present a
first easy to verify result. Suppose $S_1$, $S_2$, and $S_3$ are 3 distinct
partial splits of $X$ such that there exist parts $A_i \in S_i$,
$i = 1, 2, 3$, as in the statement of the Y-rule. If, in addition,
$\overline{A_1} \cap A_2 \cap A_3 \neq \emptyset$ and
$A_1 \subseteq \overline{A_2} \cup \overline{A_3}$, then the partial split
$A_1|\overline{A_1} \cup (A_2 \cap A_3)$ generated by $\theta_Y$ is also
generated by first applying $\theta_M$ to $S_2$ and $S_3$ (with regards to
$A_2 \cap A_3 \neq \emptyset$) and then applying $\theta_M$ to the resulting
partial split $\overline{A_2} \cup \overline{A_3}|A_2 \cap A_3$ and $S_1$.

In addition, we have the following result whose straightforward proof we
leave to the reader.

Proposition 3.1 Suppose $S_1$ is a full split of $X$. Then the following
statements hold.

(i) If $S_2$ is a partial $X$-split and $\theta_Z$ applies to
$\Sigma = \{S_1, S_2\}$, then

(ii) If $S_2$ and $S_3$ are partial $X$-splits so that $\theta_Y$ applies to
$\Sigma = \{S_1, S_2, S_3\}$ and $\theta_Z$ applies to $\{S_1, S_2\}$ and
$\{S_1, S_3\}$, then

$$\Bigl(\theta_Y(\Sigma) \cup \bigcup_{j=2,3} \theta_M(S_1, S_j)\Bigr)^- = \Bigl(\bigcup_{i \in \{2,3\}} \theta_Z(S_1, S_i)\Bigr)^-.$$

4 Closure rules and weakly compatible collections of partial splits

In this section we introduce the notion of a weakly compatible collection 
of partial splits and study properties of the Y- and M-rules regarding such 
collections. A particular focus lies on the study of circular collections of 
partial splits which we also introduce. As we will see, they form a very rich 
subclass of such collections of partial splits. 
4.1 Weakly compatible collections of partial splits

We start this section with a definition that generalizes the concept of weak
compatibility for (full) splits of $X$ [1] to partial splits of $X$. Suppose
$S_i = A_i|\overline{A_i} \in \Sigma(X)$, $i = 1, 2, 3$, are three partial
$X$-splits. Then we call $S_1, S_2, S_3$ weakly compatible if at least one
of the four intersections

$$A_1 \cap A_2 \cap A_3,\quad A_1 \cap \overline{A_2} \cap \overline{A_3},\quad \overline{A_1} \cap A_2 \cap \overline{A_3},\quad \overline{A_1} \cap \overline{A_2} \cap A_3 \qquad (4)$$

is empty$^1$. Since the roles of $A_i$ and $\overline{A_i}$ in $S_i$,
$i = 1, 2, 3$, can be interchanged without changing $S_1, S_2, S_3$ we have
that $S_1, S_2, S_3$ are weakly compatible if and only if at least one of
the four intersections

$$\overline{A_1} \cap \overline{A_2} \cap \overline{A_3},\quad \overline{A_1} \cap A_2 \cap A_3,\quad A_1 \cap \overline{A_2} \cap A_3,\quad A_1 \cap A_2 \cap \overline{A_3}$$

is empty. More generally, we call a collection $\Sigma \subseteq \Sigma(X)$
of partial $X$-splits weakly compatible if every three partial splits in
$\Sigma$ are weakly compatible. To give an example, the partial splits
$S_1 = 123|4567$, $S_2 = 124|3567$, and $S_3 = 235|146$ are weakly
compatible whereas the partial splits $S_3$, $S_4 = 24|135$, and
$S_5 = 21|346$ are not. Thus, $\{S_1, \ldots, S_5\}$ is not weakly
compatible. Note that, as in the case of (full) splits, it is easy to see
that any collection of pairwise compatible partial splits is also weakly
compatible.
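
The weak compatibility test can be sketched as follows. Because the roles of the two parts of each split are interchangeable, the sketch assumes that the condition has to hold for both families of four intersections (the one listed in (4) and the one obtained by swapping the roles of the parts); this is an interpretation, and the function names are hypothetical.

```python
from itertools import combinations


def sp(a, b):
    """Partial split A|A-bar as an unordered pair of frozensets (taxa are characters)."""
    return frozenset({frozenset(a), frozenset(b)})


def triple_weakly_compatible(s1, s2, s3):
    """Check three partial splits against both families of four triple intersections."""
    a1, b1 = tuple(s1)
    a2, b2 = tuple(s2)
    a3, b3 = tuple(s3)
    family = [a1 & a2 & a3, a1 & b2 & b3, b1 & a2 & b3, b1 & b2 & a3]
    swapped = [b1 & b2 & b3, b1 & a2 & a3, a1 & b2 & a3, a1 & a2 & b3]
    # Each family must contain at least one empty intersection.
    return not all(family) and not all(swapped)


def weakly_compatible(sigma):
    """A collection is weakly compatible if every three of its splits are."""
    return all(triple_weakly_compatible(*t) for t in combinations(sigma, 3))
```

Applied to the example, the triple 123|4567, 124|3567, 235|146 passes the test, whereas 235|146, 24|135, 21|346 does not.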

Clearly any three partial splits $S_i = A_i|\overline{A_i} \in \Sigma(X)$,
$i = 1, 2, 3$, for which precisely one of the four intersections in (4) is
empty also satisfy Condition (2). Thus $\theta_Y$ may be applied to
$S_1, S_2, S_3$. However, as the example of the set
$\{127|3456, 1234|567, 235|146\}$ shows, application of $\theta_Y$ to a
weakly compatible collection of partial splits does not, in general, yield a
weakly compatible collection of partial splits. Also it should be noted that
$\theta_M$ applied to a weakly compatible collection of partial splits does
not always yield a weakly compatible collection of partial splits.

However, the next result, whose proof is straightforward, holds.

Lemma 4.1 Suppose $\Sigma, \Sigma' \subseteq \Sigma(X)$. If $\Sigma'$ is
weakly compatible and $\Sigma \preceq \Sigma'$ then $\Sigma$ must also be
weakly compatible.

4.2 Circular collections of partial splits 

We now turn our attention to the study of a special class of weakly
compatible collections of partial splits called circular collections of
partial splits. To be able to state their definition, we require some more
terminology which we introduce next.

$^1$ In the definition of weak compatibility for full splits, $S_1$, $S_2$
and $S_3$ are full splits and the condition in (4) is the same (see [1]).

A cycle $C$ is a connected graph with $|V(C)| \geq 3$ in which every vertex
has degree 2. We call $C$ an $X$-cycle if the vertex set of $C$ is $X$. For
$x_i \in X$ ($1 \leq i \leq n := |X|$) and $C$ an $X$-cycle, we call
$x_1, x_2, \ldots, x_n, x_{n+1} = x_1$ a vertex ordering (of $C$) if the
edge set of $C$ coincides with the set of all 2-sets $\{x_i, x_{i+1}\}$ of
$X$, $i = 1, \ldots, n$.

For a graph $G = (V, E)$ and some subset $E'$ of $E$, we denote by $G - E'$
the graph obtained from $G$ by deleting the edges in $E'$. We say that a
partial $X$-split $A|\overline{A}$ is displayed by an $X$-cycle $C$ if there
exist two distinct edges $e_1$ and $e_2$ in $C$ such that the vertex set of
one of the two components of $C - \{e_1, e_2\}$ contains $A$ and the other
one contains $\overline{A}$. More generally, we say that a set
$\Sigma \subseteq \Sigma(X)$ of partial splits is displayed by an $X$-cycle
$C$ if every partial split in $\Sigma$ is displayed by $C$. Finally, we say
that a collection $\Sigma \subseteq \Sigma(X)$ is circular if there exists
some $X$-cycle $C$ such that every partial split in $\Sigma$ is displayed by
$C$. Note that every split collection in $\Sigma(X)$ displayed by a circular
phylogenetic network is circular.
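
A brute-force sketch of these definitions follows. An $X$-cycle is represented by a vertex ordering, and deleting two distinct edges leaves two arcs, so a partial split is displayed exactly when some arc contains one part and avoids the other. Circularity is tested by trying all vertex orderings, which is only feasible for small $X$. The names `displayed` and `is_circular` are assumptions made for illustration.

```python
from itertools import permutations


def sp(a, b):
    """Partial split A|A-bar as an unordered pair of frozensets (taxa are characters)."""
    return frozenset({frozenset(a), frozenset(b)})


def displayed(split, ordering):
    """True if the partial split is displayed by the X-cycle with this vertex ordering."""
    a, a_bar = tuple(split)
    n = len(ordering)
    for start in range(n):
        for length in range(1, n):  # a proper, non-empty arc of the cycle
            arc = {ordering[(start + k) % n] for k in range(length)}
            if a.issubset(arc) and a_bar.isdisjoint(arc):
                return True
    return False


def is_circular(sigma, x):
    """Exhaustively search for an X-cycle displaying every split (small X only)."""
    first, *rest = sorted(x)
    return any(
        all(displayed(s, (first, *p)) for s in sigma) for p in permutations(rest)
    )
```

For instance, the collection {12|34, 23|14, 15|24, 45|13} is displayed by the cycle with vertex ordering 1, 2, 3, 4, 5, whereas no vertex ordering displays all of 12|35, 125|34, 13|245 and 135|24.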

As is well-known, every circular split system is in particular weakly com- 
patible. The next result shows that an analogous result holds for collections 
of partial splits. 

Lemma 4.2 Suppose $\Sigma \subseteq \Sigma(X)$. If $\Sigma$ is circular then
$\Sigma$ is also weakly compatible.

Proof: Suppose $C$ is an $X$-cycle that displays $\Sigma$ but there exist
three partial splits $S_1, S_2, S_3 \in \Sigma$ such that with
$A_i \in S_i$, $i = 1, 2, 3$, playing the role of their namesakes in (4)
none of the four intersections in (4) is empty. Then $S_1$ and $S_2$ are
incompatible and, since $S_1$ and $S_2$ are displayed by $C$, there must
exist edges $e_1, e'_1, e_2, e'_2 \in E(C)$ such that, for all
$i, j \in \{1, 2\}$ distinct, the vertex set of one component of
$C - \{e_i, e'_i\}$ contains $A_i \cup e_j$ and the other contains
$\overline{A_i} \cup e'_j$. Since $S_3$ is displayed by $C$ and neither
$A_1 \cap A_2 \cap A_3$ nor
$A_1 \cap \overline{A_2} \cap \overline{A_3}$, nor
$\overline{A_1} \cap A_2 \cap \overline{A_3}$ is empty, it follows that
$\overline{A_1} \cap \overline{A_2} \cap A_3 = \emptyset$, which is
impossible. ∎

As in the case of full splits, the converse of the above lemma is not true
in general. For example, the set $\Sigma$ comprising the partial splits
$S_1 = 12|35$, $S_2 = 125|34$, $S_3 = 13|245$ and $S_4 = 135|24$ is weakly
compatible since the sets $\{S_1, S_2\}$ and $\{S_3, S_4\}$ are pairwise
compatible. Yet, as can be easily checked, $\Sigma$ is not circular.

Corresponding to Lemma 4.1, we have:

Lemma 4.3 Suppose $\Sigma, \Sigma' \subseteq \Sigma(X)$. If $\Sigma'$ is
displayed by an $X$-cycle $C$ and $\Sigma \preceq \Sigma'$ then $\Sigma$ is
also displayed by $C$.

4.3 Circularity and the M- and Y-rule

As was noted earlier, neither the Y-rule nor the M-rule preserves weak
compatibility in general. As the next result shows, the situation is
different for the special case of circular collections of partial splits.

Proposition 4.4 Suppose $\Sigma, \Sigma' \in V(X)$ and $C$ is an $X$-cycle.
If $\Sigma'$ is obtained from $\Sigma$ by a single application of either
$\theta_Y$ or $\theta_M$ then $\Sigma$ is displayed by $C$ if and only if
$\Sigma'$ is displayed by $C$.

Proof: Suppose $\Sigma, \Sigma' \in V(X)$ and $C$ is an $X$-cycle. We start
the proof by noting that, regardless of whether $\Sigma'$ is obtained from a
single application of either $\theta_Y$ or $\theta_M$ to $\Sigma$, $\Sigma$
is displayed by $C$ whenever $\Sigma'$ is displayed by $C$ in view of
Lemma 4.3.

Conversely, suppose that $\Sigma$ is displayed by $C$. Assume first that
$\Sigma'$ is obtained from $\Sigma$ by a single application of $\theta_Y$.
Let $\{S_1, S_2, S_3\} \subseteq \Sigma$ be the set to which $\theta_Y$ is
applied. With $A_i \in S_i$, $i = 1, 2, 3$, playing the role of their
namesakes in the statement of (2), we may assume without loss of generality
that none of the three intersections
$D_1 = A_1 \cap \overline{A_2} \cap \overline{A_3}$,
$D_2 = \overline{A_1} \cap A_2 \cap \overline{A_3}$, and
$D_3 = \overline{A_1} \cap \overline{A_2} \cap A_3$ is empty but that
$A_1 \cap A_2 \cap A_3 = \emptyset$. It suffices to show that the partial
split $S = \overline{A_3} \cup (A_1 \cap A_2)|A_3$ is displayed by $C$.

Clearly, if $A_1 \cap A_2 = \emptyset$ then $S = S_3$ and, therefore, $S$ is
displayed by $C$. So assume $A_1 \cap A_2 \neq \emptyset$. Then since by
assumption $D_i \neq \emptyset$, $i = 1, 2, 3$, and $S_1$ and $S_2$ are
displayed by $C$, there must exist four distinct edges
$e_1, e'_1, e_2, e'_2 \in E(C)$ such that, for all $i, j \in \{1, 2\}$
distinct, one component of $C - \{e_i, e'_i\}$ contains $A_i$ in its vertex
set together with $e_j$ and the other contains $\overline{A_i}$ in its
vertex set together with $e'_j$. Without loss of generality, we may assume
that $X = \{x_1, \ldots, x_n\}$, $n \geq 3$, that $x_1, x_2, \ldots, x_n$ is
a vertex ordering of $C$, and that $e_1 = \{x_n, x_1\}$. Furthermore, we may
also assume without loss of generality that the component of
$C - \{e_1, e'_1\}$ that contains $x_1$ in its vertex set also contains
$A_1$. Since $D_1 \neq \emptyset \neq D_2$, and $S_3$ is displayed by $C$
there must exist distinct paths $P$ and $P'$ in $C$ such that either
$A_3 \subseteq V(P)$ or $A_3 \subseteq V(P')$ (see Figure 4). If
$A_3 \subseteq V(P)$ then

$$\emptyset = \overline{A_1} \cap \overline{A_2} \cap A_3 = D_3$$

which is impossible. Thus $A_3 \subseteq V(P')$ must hold. Suppose
$y, z \in V(C)$ are such that when starting at $x_1$ and traversing $C$
clockwise $y$ is contained in $\overline{A_3}$ and the next vertex $y'$ on
$C$ with $y' \in A_3 \cup \overline{A_3}$ is contained in $A_3$ whereas
$z \in A_3$ and the next vertex $z'$ on $C$ with
$z' \in A_3 \cup \overline{A_3}$ is contained in $\overline{A_3}$.




Figure 4: A schematic representation of the two alternative locations for
$A_3$ (cf. proof of Proposition 4.4). The closed curve is the $X$-cycle $C$,
the four curves with the short dashes represent the four non-empty
intersections $A_1 \cap A_2$, $A_1 \cap \overline{A_2}$,
$\overline{A_1} \cap A_2$ and $\overline{A_1} \cap \overline{A_2}$ (note
that each of them can consist of more than one part), the rectangles mark
the intersections $D_1$ and $D_2$, and the dotted and dashed curves
represent the two paths $P$ and $P'$ on $C$ on which $A_3$ can lie.

Let $P''$ denote the path from $z'$ to $y$ (taken clockwise). Then $e_2$ and
$e'_1$ are edges on $P''$ and so $A_1 \cap A_2 \subseteq V(P'')$. The choice
of $y$ and $z'$ implies $V(P'') \cap A_3 = \emptyset$ and
$\overline{A_3} \cup (A_1 \cap A_2) \subseteq V(P'')$. Hence, the split
$V(P'')|X - V(P'')$ which is displayed by $C$ extends the partial split $S$.
Thus $C$ displays $S$. This concludes the proof in case the applied closure
rule is $\theta_Y$.

To conclude the proof of the proposition suppose $\Sigma'$ is obtained from
$\Sigma$ by a single application of $\theta_M$. Let
$\{S_1, S_2\} \subseteq \Sigma$ be the set to which $\theta_M$ is applied.
With $A_i \in S_i$, $i = 1, 2$, we may assume without loss of generality
that $A_1 \cap A_2 \neq \emptyset$ and
$\overline{A_1} \cap \overline{A_2} \neq \emptyset$. If $\theta_M$ applies
trivially to $\Sigma$ then $\Sigma = \Sigma'$ and so $\Sigma'$ must be
displayed by $C$. If $\theta_M$ does not apply trivially to $\Sigma$ it
suffices to show that $C$ displays
$(A_1 \cap A_2)|(\overline{A_1} \cup \overline{A_2})$.

Since $S_1$ and $S_2$ are displayed by $C$ there must exist edges
$e_i, e'_i \in E(C)$ such that the vertex set of one of the two components
$P_i$, $P'_i$ of $C - \{e_i, e'_i\}$ contains $A_i$ and the other contains
$\overline{A_i}$, $i = 1, 2$. Put
$k := |\{e_1, e'_1\} \cap \{e_2, e'_2\}|$ and note that $0 \leq k \leq 2$.
Without loss of generality, we may assume $A_i \subseteq V(P_i)$ and
$\overline{A_i} \subseteq V(P'_i)$, $i = 1, 2$. Then,
$\emptyset \neq A_1 \cap A_2 \subseteq V(P_1) \cap V(P_2)$. Since
$V(P_1) \cap V(P_2)$ is the vertex set of one of the $4 - k$ components of
$C$ with the edges $e_i, e'_i$, $i = 1, 2$, removed, it follows that there
must exist two distinct edges $e_3$, $e_4$ among the edges
$e_1, e'_1, e_2, e'_2$ so that the vertex set of one of the two components
of $C - \{e_3, e_4\}$ is $V(P_1) \cap V(P_2)$. Since

$$X - (V(P_1) \cap V(P_2)) = (X - V(P_1)) \cup (X - V(P_2)) = V(P'_1) \cup V(P'_2)$$

is the vertex set of the other component of $C - \{e_3, e_4\}$ and
$\overline{A_i} \subseteq V(P'_i)$, $i = 1, 2$, it follows that $C$ displays
$(A_1 \cap A_2)|(\overline{A_1} \cup \overline{A_2})$. This concludes the
proof in case $\Sigma'$ is obtained from $\Sigma$ by a single application of
$\theta_M$ and thus the proof of the proposition. ∎

Interestingly, the Z-rule does not preserve circularity in general. An
example in point is the $X$-cycle $C$ with $X = \{1, \ldots, 5\}$ and the
natural ordering of the elements of $X$ as vertex ordering. Then the partial
splits $S_1 = 13|45$ and $S_2 = 34|25$ are clearly displayed by $C$. Yet the
Z-rule applied to $\{S_1, S_2\}$ generates the partial splits $13|245$ and
$25|134$ which cannot be displayed by $C$.
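
This counterexample can be replayed with a short sketch. It assumes the reading of the Z-rule in which $A_1 \cap A_2$, $A_2 \cap \overline{A_1}$ and $\overline{A_1} \cap \overline{A_2}$ are non-empty while $A_1 \cap \overline{A_2}$ is empty, and it tests displayability on the cycle with the natural vertex ordering. The helper names are hypothetical.

```python
def sp(a, b):
    """Partial split A|A-bar as an unordered pair of frozensets (taxa are characters)."""
    return frozenset({frozenset(a), frozenset(b)})


def other(s, a):
    """The part of split s that is complementary to its part a."""
    (b,) = s - {a}
    return b


def z_rule(s1, s2, a1, a2):
    """Apply the Z-rule for one choice of parts; return None if its condition fails."""
    b1, b2 = other(s1, a1), other(s2, a2)
    if not (a1 & a2 and a2 & b1 and b1 & b2) or (a1 & b2):
        return None
    return {sp(b1 | b2, a1), sp(b2, a1 | a2)}


def displayed(split, ordering):
    """True if the partial split is displayed by the X-cycle with this vertex ordering."""
    a, a_bar = tuple(split)
    n = len(ordering)
    for start in range(n):
        for length in range(1, n):
            arc = {ordering[(start + k) % n] for k in range(length)}
            if a.issubset(arc) and a_bar.isdisjoint(arc):
                return True
    return False
```

With the parts 13 of $S_1$ and 34 of $S_2$, the rule returns 13|245 and 25|134. Both input splits are displayed by the cycle 1, 2, 3, 4, 5, but neither generated split is.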

5 Split closure sequences and split closures 

In this section, we associate to a set $\Sigma$ of partial splits a split
closure sequence and define the last element of such a sequence to be a
split closure of $\Sigma$. We also establish a key result for this paper
which shows that under certain circumstances a split closure is unique.

5.1 Split closure sequences 

Suppose $\Sigma \in V(X)$ is a collection of partial splits that satisfies
some partial splits property (P) such as, for example, weak compatibility
and $\theta$ is one of the closure rules considered in this paper. Following
[21], we associate a split closure sequence $\sigma$ and a split closure to
$\Sigma$ as follows.

$$\sigma : \Sigma_0, \Sigma_1, \Sigma_2, \ldots, \Sigma_i, \Sigma_{i+1}, \ldots$$

is a strictly increasing (with respect to $\preceq$) sequence of sets in
$V(X)$ so that $\Sigma = \Sigma_0$ and, for all $i \geq 0$, $\Sigma_{i+1}$
is obtained by one non-trivial application of $\theta$ to $\Sigma_i$
whenever $\Sigma_i$ satisfies (P). Note that since $X$ is finite, there must
exist a last element $\Sigma_n$ in $\sigma$ such that $\Sigma_n$ either
satisfies (P) and is closed under $\theta$ or $\Sigma_n$ does not satisfy
(P). In the latter case we reset $\Sigma_n$ to be a new element
$\omega \notin V(X)$. We refer to $\sigma$ as a split closure sequence for
$\Sigma$ and call $n$ the length of $\sigma$. In addition, we call the last
element of $\sigma$ a split closure of $\Sigma$. Note that in case
$\Sigma_n \neq \omega$, $\theta$ applies only trivially to $\Sigma_n$.

The following combinations of (P) and $\theta$ are of interest to us:

(a) (P) is the property that $\Sigma$ is weakly compatible and $\theta$ is
the Y-rule.

(b) (P) is unspecified and $\theta$ is the M-rule.

(c) (P) is the property that $\Sigma$ is weakly compatible and $\theta$ is
the M/Y-combination closure rule $\theta_{M/Y}$ which applies $\theta_M$ or
$\theta_Y$ to $\Sigma$.

To elucidate the notion of a split closure sequence and a split closure
associated to a set in $V(X)$ we next present an example for the assignments
of (P) and $\theta$ specified in (a). Consider the set
$X = \{1, 2, 3, 4, 5\}$ together with the collection $\Sigma$ comprising the
partial $X$-splits $S_1 = 12|34$, $S_2 = 23|14$, $S_3 = 15|24$, and
$S_4 = 45|13$. Clearly, $\Sigma$ is displayed by an $X$-cycle $C$ with
vertex ordering 1, 2, 3, 4, 5. Thus $\Sigma$ is circular and so, by
Lemma 4.2, $\Sigma$ is weakly compatible. Now $\theta_Y$ applied to
$\{S_1, S_2, S_3\}$ generates the split $S'_3 = 15|234$, $\theta_Y$ applied
to $\{S_1, S_2, S_4\}$ generates the split $S'_4 = 45|123$ and $\theta_Y$
applied to $\{S_2, S'_3, S'_4\}$ generates the split $S'_2 = 145|23$. Since
every subset of $\Sigma' = \{S_1, S'_2, S'_3, S'_4\}$ of size three contains
two pairwise compatible full splits, $\theta_Y$ can only be applied
trivially to $\Sigma'$. Hence, the sequence $\Sigma_0 = \Sigma$,
$\Sigma_1 = \{S_1, S_2, S'_3, S_4\}$, $\Sigma_2 = \{S_1, S_2, S'_3, S'_4\}$,
$\Sigma'$ is a split closure sequence for $\Sigma$ of length 3 and $\Sigma'$
is a split closure for $\Sigma$.
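
The split closure of this example can be recomputed with the sketch below. It makes several simplifying assumptions: Condition (2) is read with $A_1 \cap A_2 \cap A_3$ empty and the three intersections with exactly one uncomplemented part non-empty, all applicable instances of the Y-rule are applied in each round rather than one at a time, and property (P) is not re-checked between rounds. The function names are hypothetical.

```python
from itertools import combinations, product


def sp(a, b):
    """Partial split A|A-bar as an unordered pair of frozensets (taxa are characters)."""
    return frozenset({frozenset(a), frozenset(b)})


def extends(s, t):
    """True if split s extends split t."""
    a, a_bar = tuple(s)
    b, b_bar = tuple(t)
    return (b.issubset(a) and b_bar.issubset(a_bar)) or (
        b.issubset(a_bar) and b_bar.issubset(a)
    )


def reduce_splits(sigma):
    """Remove every split that is extended by a different split of sigma."""
    return frozenset(
        s for s in sigma if not any(t != s and extends(t, s) for t in sigma)
    )


def y_outputs(s1, s2, s3):
    """All splits generated by the Y-rule over every admissible choice of parts."""
    out = set()
    for a1, a2, a3 in product(s1, s2, s3):
        (b1,), (b2,), (b3,) = s1 - {a1}, s2 - {a2}, s3 - {a3}
        if all([a1 & b2 & b3, b1 & a2 & b3, b1 & b2 & a3]) and not (a1 & a2 & a3):
            out |= {sp(b1 | (a2 & a3), a1), sp(b2 | (a1 & a3), a2), sp(b3 | (a1 & a2), a3)}
    return out


def y_closure(sigma):
    """Iterate the Y-rule (all applications per round) until nothing changes."""
    sigma = reduce_splits(sigma)
    while True:
        enlarged = set(sigma)
        for triple in combinations(sigma, 3):
            enlarged |= y_outputs(*triple)
        enlarged = reduce_splits(enlarged)
        if enlarged == sigma:
            return sigma
        sigma = enlarged
```

Starting from $\{12|34, 23|14, 15|24, 45|13\}$, the first round produces $15|234$ and $45|123$, the second produces $145|23$, and the third changes nothing, so the sketch ends with $\{12|34, 145|23, 15|234, 45|123\}$, in agreement with $\Sigma'$.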

Regarding (c), it should be noted that even if for some $\Sigma \in V(X)$
two distinct split closure sequences have the same length and terminate in
the same element $\Sigma' \neq \omega$ one of them might utilise fewer
applications of $\theta_Y$ (and thus more applications of $\theta_M$) than
the other. For the previous example, one way to construct two such sequences
is to exploit the following relationship between the Y-rule and the M-rule
for $\{S_2, S'_3, S'_4\}$.

Proposition 5.1 Suppose E = {Si = Ai\Ai : i = 1,2,3} <E V(X) is such 
that Ai C ^4 2 and A2 — Ai C A 3 C Ai U A2. If the Y-rule applies to E then 

9 Y (Z)- = {AiUA 2 \Ai,S 2 ,S 3 } = {S 3 }u9 M (Si,S 2 )-. 

Proof: Assume that $\Sigma$ and $S_i$ and $A_i$, $i = 1,2,3$, are such that the
assumptions of the proposition are satisfied. Then $A_1 \cap A_2 = \emptyset$.
Combined with the assumption that $\theta_Y$ applies to $\Sigma$, it follows that
either (2) is satisfied with $A_i$, $i = 1,2,3$, playing the roles of their
namesakes in the statement of (2) or (2) is satisfied with $A_3$ playing the
role of $\overline{A}_3$ and $A_i$ playing the role of $A_i$, $i = 1,2$, in that
statement. But the latter alternative cannot hold since this implies
$A_1 \cap \overline{A}_2 \cap A_3 \neq \emptyset$ whereas the assumption
$A_3 \subseteq \overline{A}_1 \cup A_2$ implies

$$A_1 \cap \overline{A}_2 \cap A_3 \subseteq A_1 \cap \overline{A}_2 \cap (\overline{A}_1 \cup A_2) = (A_1 \cap \overline{A}_2 \cap \overline{A}_1) \cup (A_1 \cap \overline{A}_2 \cap A_2) = \emptyset.$$

Hence, (2) is satisfied with $A_i$, $i = 1,2,3$, playing the roles of their
namesakes in the statement of (2). Let $S'_i$, $i = 1,2,3$, be as in the
statement of the Y-rule. Then $A_2 - \overline{A}_1 \subseteq A_3$ implies
$S'_1 = \overline{A}_1 \cup A_2|A_1$. And since $A_1 \subseteq \overline{A}_2$, we
have $S_2 = S'_2$ and $A_1 \cap A_2 = \emptyset$ which in turn implies $S_3 = S'_3$.
Consequently,

$$\theta_Y(\Sigma)^- = \{\overline{A}_1 \cup A_2|A_1, S_2, S_3\}.$$

To observe the remaining set equality, note that since (2) is satisfied with
$A_i$, $i = 1,2,3$, playing the roles of their namesakes in the statement of
(2), neither $A_1 \cap \overline{A}_2$ nor $\overline{A}_1 \cap A_2$ can be empty. Let
$S'_i$, $i = 1,2$, be as in the statement of the M-rule. Then
$A_1 \subseteq \overline{A}_2$ implies $S'_1 = \overline{A}_1 \cup A_2|A_1$ and
$S'_2 = \overline{A}_1 \cap A_2|\overline{A}_2$. This implies the sought after set
equality and thus proves the proposition. □
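
The set equality of Proposition 5.1 can be traced on the triple of the
previous example. The sketch below is illustrative only and rests on
assumptions that are ours: the Y-rule output is taken in the form
$S'_i = \overline{A}_i \cup (A_j \cap A_k)|A_i$ and the M-rule output in the form
$A_1 \cap A_2|\overline{A}_1 \cup \overline{A}_2$, $A_1 \cup A_2|\overline{A}_1 \cap \overline{A}_2$,
as read off from the proofs in this section, and the superscript "$-$" is
interpreted as discarding every split that is properly extended by another
one. All function names are hypothetical.

```python
def split(a, b):
    """Partial split A|B as an unordered pair of disjoint, non-empty sets."""
    return frozenset({frozenset(a), frozenset(b)})


def extends(s, t):
    """True if split s extends split t (each part of t lies in its own part of s)."""
    a, b = tuple(s)
    c, d = tuple(t)
    return (c <= a and d <= b) or (c <= b and d <= a)


def maximal(splits):
    """Assumed reading of the superscript '-': drop every split that is
    properly extended by another split of the collection."""
    return {s for s in splits if not any(t != s and extends(t, s) for t in splits)}


def y_outputs(parts):
    """parts[i] = (A_i, B_i), with B_i the other part of S_i.
    Assumed output form: S_i' = B_i ∪ (A_j ∩ A_k) | A_i."""
    out = set()
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3
        a_i, b_i = parts[i]
        out.add(split(b_i | (parts[j][0] & parts[k][0]), a_i))
    return out


def m_outputs(p1, p2):
    """p = (A, B), oriented so that A1 ∩ A2 and B1 ∩ B2 are non-empty.
    Outputs A1 ∩ A2 | B1 ∪ B2 and A1 ∪ A2 | B1 ∩ B2."""
    (a1, b1), (a2, b2) = p1, p2
    if not (a1 & a2 and b1 & b2):
        raise ValueError("M-rule does not apply in this orientation")
    return {split(a1 & a2, b1 | b2), split(a1 | a2, b1 & b2)}


# The triple 23|14, 15|234, 45|123 of the example, labelled S_1, S_2, S_3
# so that the hypotheses of the proposition hold.
S1 = ({2, 3}, {1, 4})
S2 = ({1, 5}, {2, 3, 4})
S3 = ({4, 5}, {1, 2, 3})
sigma = {split(*S1), split(*S2), split(*S3)}

y_side = maximal(sigma | y_outputs([S1, S2, S3]))
# Orient S_1 as 14|23 so that 14 ∩ 15 and 23 ∩ 234 are non-empty.
m_side = {split(*S3)} | maximal(
    {split(*S1), split(*S2)} | m_outputs((S1[1], S1[0]), S2)
)
print(y_side == m_side)
```

Under these assumptions both sides equal $\{145|23, 15|234, 45|123\}$.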

Clearly, independent of which one of the rules $\theta_Y$, $\theta_M$ or
$\theta_{M/Y}$ is applied, a split closure sequence must always be finite since
$X$ is finite. In addition, by applying the same arguments as Semple and
Steel in [21], one can show that, for the assignments of (P) and $\theta$ as
described in (a), the length of a split closure sequence for a weakly
compatible set $\Sigma \in V(X)$ is bounded from above by
$|\Sigma| \cdot |X| - \sum_{A|B \in \Sigma} |A \cup B|$.

5.2 Split closures 

We start with a lemma that is crucial for showing that the split closure of
some collection $\Sigma \in V(X)$ is unique in any of the three combinations for
(P) and $\theta$ stated in (a) - (c).

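Lemma 5.2 Suppose $\Sigma \in V(X)$, $\overline{\Sigma} \neq \omega$ is a split closure
of $\Sigma$ and $\Sigma_r$ and $\Sigma_{r+1}$ are two consecutive elements in a split
closure sequence for $\Sigma$.

(i) If $\Sigma_r$ is weakly compatible, $\Sigma_r \preceq \overline{\Sigma}$ and
$\Sigma_{r+1}$ is obtained from $\Sigma_r$ by one application of $\theta_Y$, then
$\Sigma_{r+1}$ is weakly compatible and $\Sigma_{r+1} \preceq \overline{\Sigma}$.

(ii) If $\Sigma_{r+1}$ is obtained from $\Sigma_r$ by one application of $\theta_M$
and $\Sigma_r \preceq \overline{\Sigma}$, then $\Sigma_{r+1} \preceq \overline{\Sigma}$.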

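Proof: Suppose $\Sigma$, $\overline{\Sigma}$, $\Sigma_r$, $\Sigma_{r+1}$ are as in the
statement of the lemma.

(i) Assume $\Sigma_{r+1}$ is obtained from $\Sigma_r$ by applying $\theta_Y$ to some
set $\{S_1, S_2, S_3\}$ contained in $\Sigma_r$. For $i = 1,2,3$ and with
$A_i \in S_i$ playing the role of their namesakes in the statement of (2), we
obtain

$$S'_3 = \overline{A}_3 \cup (A_1 \cap A_2)|A_3, \quad S'_2 = \overline{A}_2 \cup (A_1 \cap A_3)|A_2, \quad \text{and} \quad S'_1 = \overline{A}_1 \cup (A_3 \cap A_2)|A_1.$$

It follows that

$$\Sigma_{r+1} = (\Sigma_r \cup \{S'_1, S'_2, S'_3\})^-.$$

Since $\Sigma_r \preceq \overline{\Sigma}$, there exist partial splits
$S''_i = A''_i|\overline{A}''_i \in \overline{\Sigma}$ with $S''_i$ extending $S_i$,
$i = 1,2,3$. Without loss of generality we may assume for all $i$ that
$A_i \subseteq A''_i$ and $\overline{A}_i \subseteq \overline{A}''_i$. Since
$\overline{\Sigma}$ is weakly compatible, (2) is satisfied by
$\{S''_1, S''_2, S''_3\}$ with $A''_1 \cap A''_2 \cap A''_3 = \emptyset$. Since, by
assumption, $\overline{\Sigma} \neq \omega$ and so $\theta_Y$ applies only trivially
to $\overline{\Sigma}$, we must have $A''_1 \cap A''_2 \subseteq \overline{A}''_3$,
$A''_1 \cap A''_3 \subseteq \overline{A}''_2$, and
$A''_2 \cap A''_3 \subseteq \overline{A}''_1$. It follows that, for all $i = 1,2,3$,
$S'_i$ is extended by $S''_i$ which in turn implies
$\Sigma_{r+1} \preceq \overline{\Sigma}$. Since $\overline{\Sigma} \neq \omega$ and so
$\overline{\Sigma}$ is weakly compatible, Lemma 4.1 implies that $\Sigma_{r+1}$ is
weakly compatible.

(ii) Suppose $\Sigma_{r+1}$ is obtained from $\Sigma_r$ by applying $\theta_M$ to
some set $\{S_1, S_2\} \subseteq \Sigma_r$. Put $S_i = A_i|\overline{A}_i$, $i = 1,2$,
and assume without loss of generality that $A_1 \cap A_2 \neq \emptyset$ and
$\overline{A}_1 \cap \overline{A}_2 \neq \emptyset$. Then

$$S'_1 = A_1 \cap A_2|\overline{A}_1 \cup \overline{A}_2 \quad \text{and} \quad S'_2 = A_1 \cup A_2|\overline{A}_1 \cap \overline{A}_2$$

and so

$$\Sigma_{r+1} = (\Sigma_r \cup \{S'_1, S'_2\})^-.$$

By assumption, $\Sigma_r \preceq \overline{\Sigma}$ and so there exist partial splits
$S''_1 = A''_1|\overline{A}''_1$, $S''_2 = A''_2|\overline{A}''_2$ in
$\overline{\Sigma}$ with $S''_i$ extending $S_i$, $i = 1,2$. Without loss of
generality we may assume $A_i \subseteq A''_i$ and
$\overline{A}_i \subseteq \overline{A}''_i$, $i = 1,2$. Then
$A''_1 \cap A''_2 \neq \emptyset$ and
$\overline{A}''_1 \cap \overline{A}''_2 \neq \emptyset$. Since, by assumption,
$\overline{\Sigma} \neq \omega$ and so $\theta_M$ only applies trivially to
$\overline{\Sigma}$, we have $\theta_M(S''_1, S''_2) \preceq \overline{\Sigma}$. Hence,
there exist partial splits $S_j = A_j|\overline{A}_j \in \overline{\Sigma}$,
$j = 3,4$, so that $A''_1 \cap A''_2|\overline{A}''_1 \cup \overline{A}''_2$ is
extended by $S_3$ and $\overline{A}''_1 \cap \overline{A}''_2|A''_1 \cup A''_2$ is
extended by $S_4$. Without loss of generality, we may assume that
$A''_1 \cap A''_2 \subseteq A_3$ and
$\overline{A}''_1 \cup \overline{A}''_2 \subseteq \overline{A}_3$ and that
$A''_1 \cup A''_2 \subseteq A_4$ and
$\overline{A}''_1 \cap \overline{A}''_2 \subseteq \overline{A}_4$. Then

$$A_1 \cap A_2 \subseteq A''_1 \cap A''_2 \subseteq A_3 \quad \text{and} \quad \overline{A}_1 \cup \overline{A}_2 \subseteq \overline{A}''_1 \cup \overline{A}''_2 \subseteq \overline{A}_3$$

and so $S_3$ extends $S'_1$. Similarly, it follows that $S_4$ extends $S'_2$.
Thus, $\Sigma_{r+1} \preceq \overline{\Sigma}$. □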
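
The relation $\preceq$ used throughout this proof can be illustrated on the
split closure sequence of the example in Section 5.1. The sketch below
assumes, as the proof suggests, that $\Sigma \preceq \Sigma'$ holds when every
split in $\Sigma$ is extended by some split in $\Sigma'$; the helper names
`extends` and `precedes` are ours.

```python
def extends(s, t):
    """True if the partial split s = (A, B) extends t = (C, D),
    i.e. C and D lie in different parts of s."""
    (a, b), (c, d) = s, t
    return (c <= a and d <= b) or (c <= b and d <= a)


def precedes(sigma, sigma_bar):
    """Assumed meaning of sigma ⪯ sigma_bar: every split of sigma is
    extended by some split of sigma_bar."""
    return all(any(extends(t, s) for t in sigma_bar) for s in sigma)


S1, S2, S3, S4 = ({1, 2}, {3, 4}), ({2, 3}, {1, 4}), ({1, 5}, {2, 4}), ({4, 5}, {1, 3})
S2p, S3p, S4p = ({1, 4, 5}, {2, 3}), ({1, 5}, {2, 3, 4}), ({4, 5}, {1, 2, 3})

# Split closure sequence of the example: Sigma_0, Sigma_1, Sigma_2, Sigma'.
chain = [
    [S1, S2, S3, S4],
    [S1, S2, S3p, S4],
    [S1, S2, S3p, S4p],
    [S1, S2p, S3p, S4p],
]

for earlier, later in zip(chain, chain[1:]):
    print(precedes(earlier, later))
# The relation is not symmetric: the closure is not preceded by Sigma_0.
print(precedes(chain[-1], chain[0]))
```

Each consecutive pair of the sequence satisfies the relation, while the
final check fails because $145|23$ is not extended by any split of $\Sigma_0$.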

With this result in hand, we are now in a position to present a key
result.

Theorem 5.3 Suppose $\Sigma \in V(X)$. Then any two split closures for $\Sigma$ are
the same if

(i) $\Sigma$ is weakly compatible and solely the Y-rule is used to obtain a split
closure for $\Sigma$,

(ii) solely the M-rule is used to obtain a split closure for $\Sigma$, or

(iii) $\Sigma$ is weakly compatible and solely the M/Y-rule is used to obtain a
split closure for $\Sigma$.

Proof: Suppose $\Sigma \in V(X)$. We first remark that we prove Statements (i),
(ii) and (iii) collectively, as the proof of all three statements relies
on an inductive argument on the length of a split closure sequence for
$\Sigma$. However, since the arguments for the inductive step differ under the
assumptions made in (i), (ii) and (iii), we discuss each inductive step
separately.

Suppose that the assumptions made in (i) or in (ii) or in (iii) hold. If
every split closure of $\Sigma$ is $\omega$ then the theorem holds trivially. So
we may assume that there exists a split closure $\overline{\Sigma}$ of $\Sigma$
with $\overline{\Sigma} \neq \omega$. We proceed by showing that every other split
closure of $\Sigma$ must equal $\overline{\Sigma}$. Suppose that
$\sigma : \Sigma_0 = \Sigma, \Sigma_1, \Sigma_2, \ldots, \Sigma_n$ is a split closure
sequence of $\Sigma$. We now use induction on $n$ to show that if $\Sigma$
satisfies the assumptions made:

in (i) then, for all $i \in \{0, 1, \ldots, n\}$,

    $\Sigma_i$ is weakly compatible and $\Sigma_i \preceq \overline{\Sigma}$;    (5)

in (ii) then, for all $i \in \{0, 1, \ldots, n\}$,

    $\Sigma_i \preceq \overline{\Sigma}$;    (6)

in (iii) then, for all $i \in \{0, 1, \ldots, n\}$,

    $\Sigma_i$ is weakly compatible and $\Sigma_i \preceq \overline{\Sigma}$.    (7)

We start with assuming that $\Sigma$ satisfies the assumptions made in (i),
that is, $\Sigma$ is weakly compatible and solely the Y-rule is used to generate
the elements of $\sigma$. If $i = 0$ then (5) obviously holds since then
$\Sigma_i = \Sigma_0$ and $\Sigma_0$ satisfies the properties stated in (5). Now
suppose that (5) holds for some $i \in \{0, 1, \ldots, n-1\}$. Then, by
Lemma 5.2(i), $\Sigma_{i+1}$ is weakly compatible and
$\Sigma_{i+1} \preceq \overline{\Sigma}$. This completes the induction step and
thereby establishes (5).

Next, assume that only the M-rule is used to generate the elements in
$\sigma$. If $i = 0$, then (6) holds since then $\Sigma_i = \Sigma_0$ and $\Sigma_0$
satisfies (6). Assume that (6) holds for some $i \in \{0, 1, \ldots, n-1\}$.
Then Lemma 5.2(ii) implies $\Sigma_{i+1} \preceq \overline{\Sigma}$ which completes
the induction step and thereby establishes (6).

Finally, assume that $\Sigma$ satisfies the assumptions made in (iii), that is,
$\Sigma$ is weakly compatible and only the M/Y-rule is used to generate the
elements in $\sigma$. If $i = 0$, then (7) obviously holds since then
$\Sigma_i = \Sigma_0$ and $\Sigma_0$ is closed under $\theta_{M/Y}$. Now suppose that
(7) holds for some $i \in \{0, 1, \ldots, n-1\}$. Then $\Sigma_i$ is weakly
compatible, $\Sigma_i \preceq \overline{\Sigma}$, and one of the following two cases
must hold. Either (a) $\Sigma_{i+1}$ is obtained from $\Sigma_i$ by applying
$\theta_Y$ or (b) $\Sigma_{i+1}$ is obtained from $\Sigma_i$ by applying $\theta_M$.

If Case (a) holds, the proof of the inductive step in (i) implies that
$\Sigma_{i+1}$ is weakly compatible and $\Sigma_{i+1} \preceq \overline{\Sigma}$.

If Case (b) holds, $\Sigma_{i+1} \preceq \overline{\Sigma}$ follows from the proof of
the inductive step in (ii). That $\Sigma_{i+1}$ is weakly compatible follows
from Lemma 5.2(ii) and the fact that $\overline{\Sigma}$ is weakly compatible.
This completes the induction step and thereby establishes (7).

We conclude by noting that for $i = n$, we obtain
$\Sigma_n \preceq \overline{\Sigma}$ regardless of whether we are assuming (i) or
(ii) or (iii) to hold. In case of (i) holding and applying (5) to $i = n$
or (iii) holding and applying (7) to $i = n$, we see that $\Sigma_n$ is weakly
compatible. By interchanging the roles of $\Sigma_n$ and $\overline{\Sigma}$, we
deduce $\overline{\Sigma} \preceq \Sigma_n$. Thus, under the assumptions made in (i)
or (ii) or (iii), we have $\overline{\Sigma} = \Sigma_n$ which concludes the proof
of the theorem. □

In the case where (P) denotes the condition "$\Sigma$ is weakly compatible"
and $\theta$ denotes either the Y-rule or the M/Y-rule, we extend the
definition of the split closure to sets in $V(X)$ that are not weakly
compatible by defining the split closure of such a set to be $\omega$. We
then obtain

Corollary 5.4 Suppose $\theta$ is either the Y- or M- or M/Y-rule. Then for
any $\Sigma \in V(X)$, any two split closures for $\Sigma$ obtained via $\theta$ are
the same.

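Bearing in mind Corollary 5.4, we denote for
$\theta \in \{\theta_Y, \theta_M, \theta_{M/Y}\}$ the split closure of a set
$\Sigma \in V(X)$ by $\langle\Sigma\rangle_\theta$. Note that
$|\langle\Sigma\rangle_{\theta_Y}| \leq |\Sigma|$ but that neither
$|\langle\Sigma\rangle_{\theta_M}| \leq |\Sigma|$ nor
$|\langle\Sigma\rangle_{\theta_{M/Y}}| \leq |\Sigma|$ have to hold. Also note that, if
we denote the collection $V(X) \cup \{\omega\}$ by $V_\omega(X)$, define
$\langle\omega\rangle_\theta = \omega$ for some closure rule $\theta$, and put
$\Sigma \preceq \omega$ for all $\Sigma \in V(X)$, then the split closure with
respect to $\theta \in \{\theta_Y, \theta_M, \theta_{M/Y}\}$ satisfies the usual
properties of a closure operation. More precisely, for all
$\Sigma, \Sigma' \in V_\omega(X)$ we have $\Sigma \preceq \langle\Sigma\rangle_\theta$, if
$\Sigma \preceq \Sigma'$ then
$\langle\Sigma\rangle_\theta \preceq \langle\Sigma'\rangle_\theta$, and
$\langle\langle\Sigma\rangle_\theta\rangle_\theta = \langle\Sigma\rangle_\theta$.

As an immediate consequence of Lemma 4.2, Proposition 4.4 and
Theorem 5.3 we obtain our main result which we state next.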

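Corollary 5.5 Suppose $\Sigma \in V(X)$ and $C$ is an $X$-cycle. Then $\Sigma$ is
displayed by $C$ if and only if $\langle\Sigma\rangle_{\theta_{M/Y}}$ is displayed by
$C$. In that case $\langle\Sigma\rangle_{\theta_Y}$ and
$\langle\Sigma\rangle_{\theta_M}$ are also displayed by $C$.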

We conclude this section by remarking that, in general, for circular
$\Sigma \in V(X)$ not all elements in $\langle\Sigma\rangle_{\theta_{M/Y}}$ need to be
full splits on $X$. However, it is reasonable to assume that those that have
been extended to full splits on $X$ contain phylogenetically relevant
information, and programs such as SplitsTree4 [12] may be employed to
produce a circular phylogenetic network that displays them. In the
following we refer to the combination of the M/Y-rule with a phylogenetic
network generation package such as SplitsTree4 as the MY-closure approach.
Although a detailed analysis of this approach is beyond the scope of this
paper and will be presented elsewhere, we note that the MY-closure approach
cannot be polynomial in the worst case since if the collection of partial
splits comprises all $3\binom{n}{4}$ partial $X$-splits $A|B$ with
$|A| = 2 = |B|$ and $n = |X|$ then $\theta_{M/Y}$ will generate all
$2^{n-1} - 1$ splits of $X$.
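
The gap between the size of the input and the size of the output in this
worst case can be made concrete by brute-force enumeration for small $n$.
The sketch below only confirms the two counts $3\binom{n}{4}$ and
$2^{n-1} - 1$; it does not run $\theta_{M/Y}$ itself, and the function names
are ours.

```python
from itertools import combinations
from math import comb


def two_two_partial_splits(n):
    """All partial splits A|B of {1, ..., n} with |A| = |B| = 2."""
    out = set()
    for quad in combinations(range(1, n + 1), 4):
        for pair in combinations(quad, 2):
            a = frozenset(pair)
            b = frozenset(quad) - a
            out.add(frozenset({a, b}))  # unordered, so A|B equals B|A
    return out


def full_splits(n):
    """All bipartitions of {1, ..., n} into two non-empty parts."""
    ground = frozenset(range(1, n + 1))
    out = set()
    for size in range(1, n):
        for part in combinations(sorted(ground), size):
            a = frozenset(part)
            out.add(frozenset({a, ground - a}))
    return out


for n in range(4, 9):
    n_partial = len(two_two_partial_splits(n))
    n_full = len(full_splits(n))
    # Compare the enumerated counts with the closed formulas.
    print(n, n_partial == 3 * comb(n, 4), n_full == 2 ** (n - 1) - 1)
```

The input size grows like $n^4$ whereas the number of full splits grows
exponentially in $n$, which is the source of the worst-case behaviour noted
above.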

6 An example: The ring of life 

One of the most fiercely debated questions amongst biologists is the origin
of eukaryotes (essentially cells that have a nucleus and organelles) [16].
The main reason for this is that eukaryotes have eubacteria-like genes as
well as archaebacteria-like genes, making it very difficult to establish
the evolutionary relationships between eukaryotes and prokaryotes
(essentially cells that lack nucleus and organelles), which is the
collective name for eubacteria and archaebacteria. To help shed light on
this question, Rivera et al. [20] analysed 10 bacterial genomes. The 5 most
probable phylogenetic trees resulting from their analysis are presented in
Fig. 1 of that paper. For the convenience of the reader, we depict them in
slightly different form in Fig. 5. Note that the collection of splits
displayed by these trees is circular and also that, when ignoring the fact
that the leaves are marked with different symbols, the last 2 trees are the
same.

Figure 5: Five most probable phylogenetic trees that appeared in slightly
different form in [20, Fig. 1] (cf. text for details).

Using a technique called Conditioned Reconstruction [20], Rivera et al.
constructed the phylogenetic network depicted in Fig. 6 with the degree 5
interior vertex plus all its incident edges removed and all resulting
degree 2 vertices suppressed. They then interpreted the resulting structure
as lending support to the idea that, in its early stages, evolution was not
tree-like but rather more like a ring (hence the term "ring of life") with
the eukaryotic genome being the result of a fusion of 2 diverse prokaryotic
genomes [20].

To find out how dependent Rivera et al.'s ring of life is on the fact
that all 5 trees are on the same leaf set, we randomly removed pairs of
leaves plus their incident edges (suppressing resulting degree 2 vertices
and always ensuring that there were no 2 trees from which the same pair of
leaves was removed), resulting in 5 trees $T_1, \ldots, T_5$ on 5 leaves.
Perhaps not surprisingly, we found that, in general, removal of pairs of
leaves did not allow us to recover Rivera et al.'s ring of life. The
exception was the collection of trees depicted in Fig. 5 with the leaves
marked by a filled-in square removed. For these 5 trees the associated
phylogenetic network $\mathcal{N}(T_1, \ldots, T_5)$ produced by the MY-closure
approach is depicted in Fig. 6 (left).

In addition, with the exception of one instance where one split in
$\Sigma(T_1, \ldots, T_5)$, that is, the set of all splits displayed by
$T_1, \ldots, T_5$, was not extended to a full split by our closure rules and
thus was not displayed by $\mathcal{N}(T_1, \ldots, T_5)$, our rules always
generated a minimum collection of splits so that
$\mathcal{N}(T_1, \ldots, T_5)$ displayed all the splits in
$\Sigma(T_1, \ldots, T_5)$.

Interestingly, both the Z-closure super-network and the Q-imputation
approach seemed to struggle with this example. The Z-closure super-network
approach yielded a very complex network $\mathcal{N}(T_1, \ldots, T_5)$ in
which numerous extensions of one and the same split in
$\Sigma(T_1, \ldots, T_5)$ were displayed (see Fig. 6 (right)), whereas with
Q-imputation $\mathcal{N}(T_1, \ldots, T_5)$ displayed only a subset of the
splits in $\Sigma(T_1, \ldots, T_5)$.

Figure 6: Left, a circular network on 2 yeast genomes, an
α-proteobacterium, a bacillus, a halobacterium, a methanococcus, an
eocyte, and an archaeoglobus (the genome abbreviations follow [20]). It
displays the split collection inferred from the collection of partial
splits induced by the trees in Fig. 5 with the leaves marked with a square
plus their incident edges removed and the resulting degree 2 vertices
suppressed using the MY-closure approach. Right, the Z-closure
super-network on the same set of partial splits.

Acknowledgment The authors thank the referees for their helpful comments.